This tutorial is meant to be a step-by-step guide for likelihood-based assignment methods, like those used in Hobson et al. 2009, van Wilgenburg et al. 2011, Vander Zanden et al. 2014, and Kusack et al. 2022. Using this code, you will be able to probabilistically assign individuals to origin by comparing stable-isotopes values of tissues to established spatial surfaces for stable isotope values within precipitation.
We will mainly use functions within the AssignR package (see Ma et al. 2020), but mention some functions within the isocat package (see Campbell et al. 2020. Our approach is similar to the overall workflow, but with a few changes.
For this tutorial we provide all of the necessary data. All other data will be generated within R, including the isotope data for the assigned individuals.
1 - Load packages
The following packages are required to run this Rmd script. They will automatically install and load if you do not currently have them installed.
if (!require('assignR')) install.packages('assignR'); library('assignR')
if (!require('raster')) install.packages('raster'); library('raster')
if (!require('terra')) install.packages('terra'); library('terra')
if (!require('sf')) install.packages('sf'); library('sf')
if (!require('sp')) install.packages('sp'); library('sp')
if (!require('rasterVis')) install.packages('rasterVis'); library('rasterVis')
if (!require('rnaturalearth')) install.packages('rnaturalearth'); library('rnaturalearth')
if (!require('dplyr')) install.packages('dplyr'); library('dplyr')
if (!require('RColorBrewer')) install.packages('RColorBrewer'); library('RColorBrewer')
if (!require('ggplot2')) install.packages('ggplot2'); library('ggplot2')
if (!require('plotly')) install.packages('plotly'); library('plotly')
if (!require('rdryad')) install.packages('rdryad'); library('rdryad')
custom.palette <- colorRampPalette(brewer.pal(9, "YlGnBu")) # Color palette2 - Isotope data
For this tutorial, we will use data collected during my PhD thesis on American Black Ducks (see link for species description). These data were collected from hunters, via the parts collection and species composition surveys, from across their range in eastern North America during hunting season (Sep-Jan). From these ducks, we have primary feather stable-hydrogen values (\(\delta\)2Hf; relative to the Vienna Standard Mean Ocean Water, VSMOW). These data are published and publicly available here, but we won’t repeat the analysis from that paper.
Based on timing of capture and knowing the moult schedule for black ducks, we know that these feathers were grown on the breeding grounds or moulting sites and we can use these feathers to assign individuals to breeding/moult origin. For your own data, it is important to understand moult timing and life history of your species to know exactly where the tissues were grown.
duck.d2h <- read.csv(rdryad::dryad_files_download(1697805)[[1]]) # Download data directly from Dryad
glimpse(duck.d2h) # Check data structureRows: 664
Columns: 23
$ id.lab <chr> "UWO10300", "UWO10301", "UWO10302", "UWO10303", "UWO10304…
$ id.full <chr> "A192066", "M274723", "M274758", "M286504", "M317005", "M…
$ VSMOW <dbl> -126.0, -125.9, -110.5, -117.1, -113.2, -103.6, -97.2, -1…
$ county.id <chr> "MI Tuscola", "MI Iosco", "MI Iosco", "WI Green Lake", "M…
$ species <chr> "ABDU", "ABDU", "ABDU", "ABDU", "ABDU", "ABDU", "ABDU", "…
$ country <chr> "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "…
$ season <chr> "2017-2018", "2017-2018", "2017-2018", "2017-2018", "2017…
$ state.prov <chr> "MI", "MI", "MI", "WI", "MI", "MI", "WI", "IL", "IL", "OH…
$ county <chr> "Tuscola", "Iosco", "Iosco", "Green Lake", "Benzie", "Ion…
$ lat.dec <dbl> 43.46972, 44.36015, 44.36015, 43.80162, 44.63980, 42.9500…
$ long.dec <dbl> -83.41717, -83.64132, -83.64132, -89.04051, -86.01980, -8…
$ age <chr> "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I…
$ sex <chr> "F", "M", "M", "F", "M", "F", "M", "F", "F", "M", "F", "F…
$ mm <int> 11, 10, 11, 10, 10, 10, 11, 12, 11, 12, 12, 11, 10, 12, 1…
$ dd <int> 8, 11, 2, 26, 10, 28, 13, 26, 25, 31, 3, 28, 28, 31, 19, …
$ yy <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
$ ABDU_region <chr> "interior", "interior", "interior", "interior", "interior…
$ county.lat <dbl> 43.46972, 44.36015, 44.36015, 43.80162, 44.63980, 42.9500…
$ county.long <dbl> -83.41717, -83.64132, -83.64132, -89.04051, -86.01980, -8…
$ VPDB <dbl> NA, NA, NA, NA, NA, NA, -25, NA, NA, NA, NA, NA, NA, NA, …
$ AIR <dbl> NA, NA, NA, NA, NA, NA, 8.5, NA, NA, NA, NA, NA, NA, NA, …
$ date <int> 312, 284, 306, 299, 283, 301, 317, 360, 329, 365, 337, 33…
$ hunting.date <int> 68, 40, 62, 55, 39, 57, 73, 116, 85, 121, 93, 88, 57, 121…
To make this more manageable, we will just select juvenile birds harvested in Ontario.
duck.d2h <- filter(duck.d2h, age == "I") %>% # Select the juveniles
filter(state.prov == "ON") %>% # select the birds harvested in Ontario
select(id.lab, VSMOW, long.dec, lat.dec)
nrow(duck.d2h) # sample size[1] 41
So that leaves us with 41 juvenile black ducks harvested in Ontario across 2 hunting seasons (2017-2018, 2018-2019).
3 - Shapefiles
Before we start working with isoscapes, we should ensure that all the necessary polygons for the assignment are loaded. These polygons are: North America and a breeding range. For the North America polygon, we will use the rnaturalearth package to load a map of the countries/states within North America.
For the breeding range, I have provided a shapefile hosted on github (https://github.com/JacksonKusack/Isotope-Assignment-Workshop). Other easily accessible options exist for these breeding ranges for birds, such as BirdLife International or eBird which both require a simple request for access but provide the data quickly.
To be safe - we will reproject them so that they are all in the correct coordinate system (CRS). If you are unfamiliar with coordinate systems and projections in R, I won’t go through them here. Here is a quick overview. Most of the isoscape data that we are working with uses the WGS84 coordinate system (EPSG 4326). This system uses latitude and longitude coordinates on the WGS84 reference ellipsoid. This is the coordinate system commonly used by organizations that provide GIS data for the entire globe or many countries.
northamerica <- ne_countries(continent = "North America", scale = 50, returnclass = "sf") %>% # Countries
st_transform(st_crs('EPSG:4326')) # Project to WGS84
northamerica.states <- ne_states(country = c("Canada","United States of America"), returnclass = "sf") %>% # States and provinces
st_transform(st_crs('EPSG:4326')) # Project to WGS84
breeding.range <- st_read("/vsicurl/https://github.com/JacksonKusack/Assignment-Workshop/raw/main/Shapefiles/ABDU_baldassarre_lowdens.shp") %>% # Load a shapefile from GitHub
st_transform(st_crs('EPSG:4326')) # Project to WGS84Reading layer `ABDU_baldassarre_lowdens' from data source
`/vsicurl/https://github.com/JacksonKusack/Assignment-Workshop/raw/main/Shapefiles/ABDU_baldassarre_lowdens.shp'
using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -96.86612 ymin: 34.58858 xmax: -52.61946 ymax: 60.57192
Geodetic CRS: NAD83
Here is what that range looks like:
plot(st_geometry(breeding.range), col = brewer.pal(9, "YlGnBu")[5], border = NA)
plot(st_geometry(northamerica), add = T)If you are unfamiliar with spatial data in R, the three data types that we are working with in this document are (1) polygons (i.e., sequence of two-dimensional points forming a closed non-self-intersecting shape), (2) points (i.e., series of zero-dimensional points) and (3) rasters (i.e., grid of cells storing values, all of which are the same size).
For this tutorial, we will use the terra and sf packages for most functions and data manipulation, but may have to switch back to sp or raster R objects, because the functions necessitate the older versions. Moving forward, terra and sf will be the supported standards, but until all packages adopt these formats, we have to work around them. So, we will load/index/transform the data in sf format but convert to sp temporarily using the as(x, “Spatial”) function.
Now we have the shapefiles loaded, let’s see where these black ducks were harvested.
duck.points <- vect(duck.d2h, geom = c("long.dec", "lat.dec")) # create vect (points) object
plot(duck.points, col = 'red', pch = 17)
plot(st_geometry(northamerica.states), add = T)Mostly, around southern Ontario, but some in central and western Ontario. Note some fall outside of Ontario, because these points are based on hunter estimated locations (i.e., _____ km east of _____ town), but this won’t affect these methods becuase we won’t use the harvest location here.
4 - Isoscape
Next we will download an amount-weighted growing season \(\delta\)2H precipitation isoscape (see Bowen et al. 2005 for methods), from the AssignR package. This function will save a version of this surfaces locally on your harddrive each time it’s run. This surface provides four layers: a mean and sd surface for \(\delta\)2H values and \(\delta\)18O values. For this workshop, we’ll just use the \(\delta\)2H values surfaces.
gsd <- getIsoscapes(isoType = "GlobalPrecipGS", timeout = 1200) %>%
projectRaster(crs = CRS(SRS_string = 'EPSG:4326'))Refer to C:\Users\jacks\AppData\Local\Temp\Rtmp6T5Qb5\metadata.txt for
documentation and citation information
gsd <- gsd[[1:2]] # Pull out the d2H surfaces (1 and 2)Visualizing this isoscape and error surface, we can see the we have data on hydrogen isotopes within precipitation for much of the globe! Note the error is much higher near the poles, which makes the error in other areas difficult to interpret. When we mask these later, it will be easier to see.
plot(gsd, xlab="Longitude", ylab="Latitude", col = custom.palette(16))4.1 - Calibration
Now we have the isoscape, we need to calibrate the mean \(\delta\)2Hp values to tissue relevant values. This can be done because of predictable relationships between stable-hydrogen values within feathers (\(\delta\)2Hf) and stable-hydrogen values within precipitation (\(\delta\)2Hp) at the location of feather growth (see Hobson et al. 2012).
At this point we could limit the isoscape to our breeding range, but we may have calibration data (i.e., known-origin tissues) from another species that could be found outside this range. Therefore, I find it easier to do this step on the global isoscape surface. This takes a bit longer, but it’s only a minor difference.
AssignR provides some raw known-origin data (i.e., location of known-origin tissues and their \(\delta\)2H values) stored the knownOrig dataset. At the time this was written, there were data from 18 publications stored in the database and 4062 unique known-origin samples. Each individual has additional metadata, but the available data isn’t complete in some cases. When looking at this data, be careful to ensure that the data are appropriate for your study and, as an extra step, match the original publication.
data("knownOrig")
knownOrig$sources[1:2] # list the dataset names and ids Dataset_ID Dataset_name
1 1 Hobson et al. 1999 Oecologia
2 2 Hobson et al. 2012 Plos
3 3 Hobson and Wassenaar 1997 Oecologia
4 4 Clark et al. 2006 Can J Zool
5 5 Hobson et al. 2004 Oecologia
6 6 Lott and Smith 2006 Auk
7 7 Hobson and Kohler 2015 Ecol Evol
8 8 Thompson et al. 2010 Am J Phys Anthropol
9 9 Bowen et al. 2009 Am J Phys Anthropol
10 10 Ehleringer et al. 2008 PNAS
11 11 Wunder Plover
12 12 van Dijk et al. 2014 J Avian Biol
13 13 Neto et al. 2006 J Avian Biol
14 16 Magozzi Towhee
15 17 Prochazka et al. 2013 J Avian Biol
16 18 Langin et al. 2007 Oecologia
17 19 Bataille Canadian Human Hair
18 20 Hobson et al. 2019 FEE
Lets see what species are available:
unique(knownOrig$samples$Taxon) [1] "Danaus plexippus" "Setophaga ruticilla"
[3] "Turdus migratorius" "Setophaga coronata auduboni"
[5] "Poecile atricapillus" "Thryomanes bewickii"
[7] "Thryothorus ludovicianus" "Spizella passerina"
[9] "Geothlypis trichas" "Setophaga pensylvanica"
[11] "Baeolophus bicolor" "Vermivora chrysoptera"
[13] "Catharus guttatus" "Setophaga citrina"
[15] "Geothlypis formosa" "Geothlypis tolmiei"
[17] "Oreothlypis ruficapilla" "Cardinalis cardinalis"
[19] "Oreothlypis celata" "Junco hyemalis oregonus"
[21] "Seiurus aurocapilla" "Vireo olivaceus"
[23] "Melospiza melodia" "Catharus ustulatus"
[25] "Catharus fuscescens" "Vireo griseus"
[27] "Cardellina pusilla" "Hylocichla mustelina"
[29] "Icteria virens" "Setophaga petechia"
[31] "Melozone aberti" "Vermivora cyanoptera"
[33] "Passer domesticus" "Aimophila ruficeps"
[35] "Poecile carolinensis" "Troglodytes aedon"
[37] "Dumetella carolinensis" "Mniotilta varia"
[39] "Lanius ludovicianus" "Anthus spragueii"
[41] "Euphagus carolinus" "Empidonax minimus"
[43] "Oreothlypis peregrina" "Aythya affinis"
[45] "Cyanistes caeruleus" "Phasianus colchicus"
[47] "Lagopus lagopus" "Tetrao tetrix"
[49] "Dryocopus maritus" "Serin serin"
[51] "Vanellus vanellus" "Corvus corone"
[53] "Turdus merula" "Corvus monedula"
[55] "Columba palumbus" "Turtle Dove"
[57] "Tetrastes bonasia" "Perdix perdix"
[59] "Anas platyrhyncos" "Branta canadensis"
[61] "Columba livia" "Numenius arguata"
[63] "Turdus pilaris" "Turdus iliacus"
[65] "Turdus philomelos" "Fringilla coelebs"
[67] "Buteo lagopus" "Accipiter striatus"
[69] "Falco sparverius" "Accipiter gentillis"
[71] "Accipiter cooperii" "Buteo jamaicensis"
[73] "Buteo platypterus" "Buteo swainsoni"
[75] "Circus cyaneus" "Falco columbarius"
[77] "Falco mexicanus" "Falco perigrinus"
[79] "Wilsonia citrina" "Oporornis tolmiei"
[81] "Wilsonia pusilla" "Homo sapiens"
[83] "Charadrius montanus" "Anas platyrhynchos"
[85] "Locustella luscinioides" "Acrocephalus arundinaceus"
[87] "Acrocephalus scirpaceus" "Pipilo maculatus"
Looks like there are some data for Mallard, which we can use to calibrate the isoscape for our black duck data. One option would be to use a single study that looked at Mallard, such as van Dijk et al. 2014. For that we can use the function subOrigData(…). We can specify the dataset, by number, and which marker we want. For van Dijk et al. 2014, the dataset is #12.
cal.mall <- subOrigData(marker = 'd2H', dataset = 12, ref_scale = NULL) # Extract data215 samples are found from 39 sites
cal.mall$data <- spTransform(cal.mall$data, CRS(SRS_string = 'EPSG:4326')) # reproject the dataThese datasets provide information on the source of the data (i.e., lab methods and publication information) and the data itself (e.g., tissue isotopes, age, taxon, site), as well as other metadata. For this data to work with the below functions, we need to keep it in this format, but we can always modify the data portion directly (as we did with the projection above).
str(cal.mall)List of 4
$ data :Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
.. ..@ data :'data.frame': 215 obs. of 19 variables:
.. .. ..$ Site_ID : num [1:215] 1543 1543 1543 1543 1543 ...
.. .. ..$ Site_name : chr [1:215] NA NA NA NA ...
.. .. ..$ State : chr [1:215] NA NA NA NA ...
.. .. ..$ Country : chr [1:215] NA NA NA NA ...
.. .. ..$ Site_comments : num [1:215] NA NA NA NA NA NA NA NA NA NA ...
.. .. ..$ Sample_ID : num [1:215] 2722 2723 2724 2725 2726 ...
.. .. ..$ Sample_ID_orig : chr [1:215] NA NA NA NA ...
.. .. ..$ Dataset_ID : num [1:215] 12 12 12 12 12 12 12 12 12 12 ...
.. .. ..$ Taxon : chr [1:215] "Anas platyrhynchos" "Anas platyrhynchos" "Anas platyrhynchos" "Anas platyrhynchos" ...
.. .. ..$ Group : chr [1:215] "Water bird" "Water bird" "Water bird" "Water bird" ...
.. .. ..$ Source_quality : chr [1:215] "known source" "known source" "known source" "known source" ...
.. .. ..$ Age_class : chr [1:215] NA NA NA NA ...
.. .. ..$ Material_type : chr [1:215] "Feather" "Feather" "Feather" "Feather" ...
.. .. ..$ Matrix : chr [1:215] "Keratin" "Keratin" "Keratin" "Keratin" ...
.. .. ..$ d2H : num [1:215] -134 -135 -142 -132 -132 ...
.. .. ..$ d2H.sd : num [1:215] 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 ...
.. .. ..$ d18O : num [1:215] NA NA NA NA NA NA NA NA NA NA ...
.. .. ..$ d18O.sd : num [1:215] NA NA NA NA NA NA NA NA NA NA ...
.. .. ..$ Sample_comments: num [1:215] NA NA NA NA NA NA NA NA NA NA ...
.. ..@ coords.nrs : num(0)
.. ..@ coords : num [1:215, 1:2] 27.4 27.4 27.4 27.4 27.4 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:215] "2722" "2723" "2724" "2725" ...
.. .. .. ..$ : chr [1:2] "Longitude" "Latitude"
.. ..@ bbox : num [1:2, 1:2] -8.68 40.69 39.32 63.15
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "Longitude" "Latitude"
.. .. .. ..$ : chr [1:2] "min" "max"
.. ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs"
.. .. .. ..$ comment: chr "GEOGCRS[\"WGS 84 (with axis order normalized for visualization)\",\n ENSEMBLE[\"World Geodetic System 1984 e"| __truncated__
$ sources:'data.frame': 1 obs. of 17 variables:
..$ Dataset_ID : num 12
..$ Dataset_name : chr "van Dijk et al. 2014 J Avian Biol"
..$ Citation : chr "van Dijk JGB, Meissner W, Klaassen M. 2014. Improving provenance studies in migratory birds when using feather "| __truncated__
..$ Sampling_method : chr NA
..$ Sample_powdered : chr NA
..$ Lipid_extraction : chr "Y"
..$ Lipid_extraction_method: chr "2:1 chloroform:methanol"
..$ Exchange : chr "Y"
..$ Exchange_method : chr "Comparative eqib"
..$ Exchange_T : chr "Ambient"
..$ H_cal : chr "OldEC.1_H_1"
..$ O_cal : chr NA
..$ Std_powdered : chr "Y"
..$ Drying : chr "N"
..$ Analysis_method : chr "pyrolysis; CF-IRMS"
..$ Analysis_type : chr "H"
..$ Source_comments : num NA
$ chains : NULL
$ marker : chr "d2H"
- attr(*, "class")= chr "subOrigData"
AssignR also provides methods to transform the \(\delta\)2H values into different reference scales. For details on these methods sees Magozzi et al. 2021. Simply, this allows us to combine data from different labs, if they use different keratin standards on different scales. If the ref_scale is changed though, you need to ensure that the sample data are also converted to this scale. If everything was done in the same lab or the two labs used the same methods, then you can set this to NULL.
Example:
test.transformed <- subOrigData(marker = 'd2H', dataset = 12, ref_scale = "VSMOW_H")215 samples are found from 39 sites
215 samples from 39 sites in the transformed dataset
mean(cal.mall$data$d2H) # Untransformed mean[1] -104.7414
mean(test.transformed$data$d2H) # Transformed mean[1] -78.0453
Luckily for us, the isotope data from van Dijk et al. 2014 were done using keratin standards on the same scale as the sample black duck data (e.g., OldEC.1_H_1). This is the data that we will use to run the calibration.
We can calibrate the gsd isoscape and produce a new isoscape where values are predicted feather values (\(\delta\)2Hf). To do this, we can use the calRaster(…) function, providing it with our known origin data (cal.mall) and precipitation isoscape (d2h_world).
r <- calRaster(known = cal.mall, isoscape = gsd, interpMethod = 1, verboseLM = F, genplot = F) # Calibrate the mean and sd values from our isoscape
r.model <- r$lm.model # Extract model results
summary(r.model)
Call:
lm(formula = tissue.iso ~ isoscape.iso[, 1], weights = tissue.iso.wt)
Residuals:
Min 1Q Median 3Q Max
-34.267 -9.975 -0.905 9.084 39.898
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -37.91827 3.42894 -11.06 <2e-16 ***
isoscape.iso[, 1] 1.17645 0.05909 19.91 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.11 on 213 degrees of freedom
Multiple R-squared: 0.6504, Adjusted R-squared: 0.6488
F-statistic: 396.3 on 1 and 213 DF, p-value: < 2.2e-16
p <- ggplot(data = r$lm.data, aes(y = tissue.iso, x = isoscape.iso)) +
geom_point() +
stat_smooth(method = "lm", formula = 'y ~ x') +
theme_classic()
ggplotly(p)This calibration relationship produces the following linear equation (i.e., calibration equation):
\[\delta^2H_f = 1.18 * \delta^2H_p - 37.92\]
First, you might notice that the produced calibration equation doesn’t exactly match the one in in van Dijk et al. 2014:
\[\delta^2H_f = 1.36 * \delta^2H_p - 21.9\]
This is because the amount-weighted growing season \(\delta\)2H precipitation isoscape has been updated with additional years of precipitation data since publication. And AssignR doesn’t contain all of the data in this case. But the first point would still apply if all the data were available.
The mean surface has been updated using this calibration relationship, but the error surface has also been modified using the calibration equation. For the specific methods see see Ma et al. 2020), but the general idea here is the resulting value is incoporates the standard devation of ()2H values at each cell, the standard devation of residuals within the calibration equation, and the covariance between the two measures of variation.
Now we have a new isoscape object r which again is a rasterstack containing the calibrated mean and error surfaces.
plot(r$isoscape.rescale, xlab="Longitude", ylab="Latitude", col = custom.palette(16))To limit this isoscape to the breeding range of the species of interest we then apply a mask function. We could use the mask argument within the calRaster(…) function, but the ‘mask’ argument does not use the actual polygon to limit the isoscape, it uses the extent/bounding box. So, we can use the functions within the terra package to manually apply the mask.
r$isoscape.rescale <- mask(rast(r$isoscape.rescale), breeding.range) %>% # Mask to breeding range
crop(extent(breeding.range)) %>% # Crop to breeding range
raster::stack() # Convert back to raster object
plot(r$isoscape.rescale, xlab="Longitude", ylab="Latitude", col = custom.palette(16))5 - Assignment to Origin
Now we can attempt an assignment to origin using the function pdRaster(…). This function will produce a posterior probability of origin surface, taking into account the mean ()2Hf values at each cell, as well as our estimate of error. The probability is estimated using a Normal Probability Density function:
\[f(y|\mu_c,\sigma_c) = \frac{1}{\sigma_c \sqrt{2\pi}} e^{-\frac{1}{2}(\frac{y-\mu_c}{\sigma_c})^2 }\] Where \(f(y|\mu_c,\sigma_c)\) represents the probability that a given cell (c) within the \(\delta\)2Hf isoscape represents a potential origin for an individual of unknown origin (y), given the expected mean \(\delta\)2Hf for that cell (\(\mu_c\)) from the calibrated \(\delta\)2Hf isoscape, and the expected standard deviation (\(\sigma_c\)) of \(\delta\)2Hf between individuals growing their feathers at the same locality.
origins <- pdRaster(r, data.frame(duck.d2h), mask = as(breeding.range, 'Spatial'), genplot = F)Warning in check_unknown(unknown, 1): more than n marker + 1 cols in unknown,
assuming IDs in col 1 and marker values in cols 2 to (n+1)
Each resulting probability surfaces is normalized to sum to 1, which we can check:
cellStats(origins[[10]], 'sum') # Posterior probabilities across a single raster layer should sum to 1[1] 1
After applying the above code, we now have a Rasterstack (i.e., series of raster layers with identical extent and resolution; origins) where each layer is’ a probability of origin surface for each respective individual within our duck.d2h object (n = 41). For example, this is what the surface looks like for black duck 3.
plot(origins[[3]], col = custom.palette(8))
plot(st_geometry(breeding.range), add = T)
plot(st_geometry(northamerica), add = T)But what if have have prior knowledge that can inform
5.1 - Prior Probabilities of Origin
Priors allow us to incorporate known information about the system into these assignments. For example, we can include genetic information into the assignment where the genetic similarity between an assigned individual and different breeding regions are used to refine the region of origin. These probabilities are incorporated in the likelihood-based assignment methods through Bayes Rule:
\[f_x = \frac{f(y|\mu_c,\sigma_c)f_{prior}}{\sum_{i}{f(y|\mu_c,\sigma_c)f_{prior}}}\]
In practical terms, these priors are spatially explicit probability surfaces (in raster format) that can be incorporated into the pdRaster(…) function through the argument “prior = surface”. To see how this effects the probability surfaces, we’ll create a theoretical prior. We’ll use the province/state surface to represent different breeding regions and we’ll assign a probability to those regions.
To get these prior surfaces, we can start with polygonal data where each shape has the probability as an attribute/variable. Because we are making this up, we have to mannually assign these values to our province/state shapefile, which is easy enough. The values that I’ve chosen ar
prior <- northamerica.states
prior$prob[prior$name == "Ontario"] <- 0.1
prior$prob[prior$name == "Québec"] <- 0.4
prior$prob[prior$name %in% c("New Brunswick","Nova Scotia","Prince Edward Island")] <- 0.2
prior$prob[prior$name %in% c("Newfoundland and Labrador")] <- 0.3
prior$prob[prior$admin == "United States of America"] <- 0Also, note that this prior is the same for all individuals, in this case. We could develop a RasterStack with a unique prior for each individual. This would probably be the case if we were doing genetics and had genetic information for all sampled indiviudals, but a standard prior could be necessary if we only had information for a given region or cohort.
Once we have our probabilities assigned to the poylgons, we need to rasterize them using the function rasterize(…) which converts the polygons into raster data with the same resolution and extent as the input raster. We still need to mask the resulting raster though.
prior <- rasterize(prior, r$isoscape.rescale[[1]], field = "prob") %>% # rasterize the polygons to match the calibrated isoscape
mask(breeding.range) # mask
plot(prior, col = custom.palette(16))Now we just need to input the prior into pdRaster.
origins.prior <- pdRaster(r, data.frame(duck.d2h), mask = as(breeding.range, 'Spatial'), prior = prior, genplot = F) # assignment with the priorWarning in check_unknown(unknown, 1): more than n marker + 1 cols in unknown,
assuming IDs in col 1 and marker values in cols 2 to (n+1)
If we compare the two outputs, selecting the 12th duck this time, we can see the differences. Notice that while the probability is lower in Ontario, it’s not 0. We gave a lower relative probability compared to Quebec and easern Canada, but the resulting probabilies are not 0.
plot(origins[[12]], col = custom.palette(8))
plot(st_geometry(breeding.range), add = T)
plot(st_geometry(northamerica), add = T)plot(origins.prior[[12]], col = custom.palette(8))
plot(st_geometry(breeding.range), add = T)
plot(st_geometry(northamerica), add = T)If we are only interested in one individual, this surface can give a good idea where they likely originated, but we are rarely ever interested in one individual. The power of this method relies on examining likely origin across a large group of individuals.
To combine these surfaces to examine likely origins for all of the individuals simultaneously. This can be done by summing all individuals as binary surfaces, produced using an odds ratios.
5.2 - Binary surfaces
Lastly, we can convert these surfaces into binary surfaces using a specified odds ratio (or probability threshold). Here we use the function qtlRaster(…) to determine the upper 66% of probable cells, in terms of probability density.
An odds ratio describes the relative strength of an individual originating within a given area compared to them originating outside of that area, given the posterior probability distribution. A commonly used odds ratio is 2:1. When applied, the resulting region of origin contains cells that represent the upper 66% of probability density while the cells outside this region represent the lower 33% of the probability density.
Using this approach, we select the upper 66.66% (i.e., 2:1 odds) of ‘estimated probabilities of origin’ for each individual and code these as 1 (i.e., likely origin) and all others as 0 (i.e., unlikely origin). This ratio can be changed depending on how conservative you want to be. The larger the percentage (e.g., 3:1 or 75%; 4:1 or 80%; 9:1 or 90%) the larger (i.e., more cells), and less precise, each region of likely origin will be, but the more likely that the region contains the true origin of that individual. Previous studies have found that 2:1 or 3:1 represent a good compromise between precision and accuracy.
First, we manually set the odds ratio by assigning a cumulative probability bound to the odds object. This value can be changed to represent any desired odds ratio. For 2:1 odds enter 0.67, for 3:1 odds enter 0.75, for 4:1 odds enter 0.80, for 9:1 odds enter 0.90, for 19:1 enter 0.95.
odds <- 2/3 # select upper 66% of cells
binary.origins <- qtlRaster(origins, threshold = odds, thresholdType = "prob", genplot = F)Comparing this binary surface to the probability surface for the same individual, we can see that the region of highest probability density is now populated by 1’s and all other cells are 0’s.
par(mfrow = c(1,2))
plot(origins[[3]], col = custom.palette(8))
plot(st_geometry(northamerica), add = T)
plot(binary.origins[[3]], col = custom.palette(8))
plot(st_geometry(northamerica), add = T)Now we can sum all of these surfaces to create our final raster surface where the value at each a given represents the number of individuals probabalistically assigned to that cell, given the odds ratio of 2:1.
origins <- calc(binary.origins, sum)6 - Outputs
At this point, you have your completed assignment, but it is important to save 2 things: (1) the raster file and (2) figure.
6.1 - Raster file
For the raster file, we can use the writeRaster(…) function, saving the surface as an .ascii file. Now this file can be easily loaded into R, or any other GIS software. In theory, you could even load this file to modify any exported plots without rerunning the above methods.
writeRaster(origins, "ABDU_origins", format = "ascii", overwrite = TRUE) 6.2 - Figure
Lastly, we can visualize and export a plot of this surface. Rather than using the basic plot(…) function, we can use the function levelplot(…) available from the RasterVis library. This package provides visualization methods for quantitative and categorical raster data.
Similar to ggplot2, you can add additional graphical layers using the layer(sp.polygons(…)) functions (or replace sp.polygons with sp.points(…) and sp.lines(…)).
(p1 <- levelplot(origins, col.regions = custom.palette(16), margin=FALSE,
legend=list(top=list(fun=grid::textGrob("Count", y=0.3, x=1.09)))) +
latticeExtra::layer(sp.polygons(as(northamerica, "Spatial"), size = 0.5)) +
latticeExtra::layer(sp.polygons(as(breeding.range, "Spatial"), fill = NA, alpha = 1, col = "black", lwd = 1.5, lty = 3)))Then we can export this plot using the png(…) function.
png(filename = "ABDU_origins.png", units = "in", width = 6, height = 5, res=1200)
p1
invisible(dev.off())7 - Miscellaneous Tips
- What if you don’t have an error surface?
library What if you just want to use the calibration equation?
Clustering based on surface similarity?
Multiple isotopes?
8 - References
Baldassarre GA (2014) Ducks, geese, and swans of North America. JHU Press.
Bowen GJ, Wassenaar LI, Hobson KA (2005) Global application of stable hydrogen and oxygen isotopes to wildlife forensics. Oecologia 143:337–348. Link
Campbell CJ, Fitzpatrick MC, Vander Zanden HB, Nelson DM (2020) Advancing interpretation of stable isotope assignment maps: comparing and summarizing origins of known-provenance migratory bats. Anim Migr 7:27-41. Link
Hobson KA, Van Wilgenburg SL, Wassenaar LI, Larson K (2012) Linking hydrogen (δ2H) isotopes in feathers and precipitation: sources of variance and consequences for assignment to isoscapes. PLOS ONE 7:e35137. Link
Hobson KA, Wunder MB, Van Wilgenburg SL, et al. (2009) A method for investigating population declines of migratory birds using stable isotopes: origins of harvested Lesser Scaup in North America. PLOS ONE 4:e7915. Link
Kusack JW, Tozer DC, Schummer ML, Hobson KA (2023) Origins of harvested American black ducks: stable isotopes support the flyover hypothesis. J Wildl Manag 87:e22324. Link
Ma C, Vander Zanden HB, Wunder MB, Bowen GJ (2020) assignR: an R package for isotope-based geographic assignment. Methods in Ecol Evol 11:996–1001. Link
Magozzi, S, Bataille, CP, Hobson, KA, et al. (2021) Calibration chain transformation improves the comparability of organic hydrogen and oxygen stable isotope data. Methods Ecol Evol 12:732–747. Link
van Dijk JGB, Meissner W, Klaassen M (2014) Improving provenance studies in migratory birds when using feather hydrogen stable isotopes. J Avian Biol 45:103–108. Link
van Wilgenburg SL, Hobson KA (2011) Combining stable-isotope (δD) and band recovery data to improve probabilistic assignment of migratory birds to origin. Ecological Applications 21:1340–1351. Link
Vander Zanden HB, Wunder MB, Hobson KA, et al. (2014) Contrasting assignment of migratory organisms to geographic origins using long-term versus year-specific precipitation isotope maps. Methods Ecol Evol 5:891-900. Link